Assignment 1. Network visualization of terrorist connections

Assignment 1.1

Comment

The graph above shows a network of network of the individuals suspected of having participated in the bombing of commuter trains in Madrid on March 11, 2004, data clustered by bombing group and connection between nodes made base on the strength of tie between individuals, analyzing the graph manually we can observe 4 groups, the first one (the most obvious one) which is at the center of the graph, which consist of individuals participated directly in the bombing incident “colored in blue” and their connection with individuals who did not participate “colored in yellow” with a different tie of strength, the second cluster at the top of the graph which include only individuals who didn’t participate but the have a multiple connection ties. the third cluster is not obvious but to some extend we can confirm on it which is the cluster the left bottom of the graph, which consist of yellow group (not-participated) individuals with a limited connection with blue group (participated individuals). Finally, the fourth group which are the outliers the suspected individuals with no connection ties.

Assignment 1.2

Comment

The graph above highlights nodes with connection path length 1 or 2, data shows that individuals with high number of connections have better opportunity to spread the information in the network, graph shows that Jamal Zougam as the defendant with most connection ties with other individuals. Jamal Zougam Moroccan-Spanish 48-year-old convicted of being a perpetrator of the March 11, 2004, train bombings in Madrid, Spain, that killed 191 people and injured more than 1,800. According to counter extremism website Zougam who had previously been brought to the attention of authorities for his links to al-Qaeda members, placed at least one of the bombs. He also owned the shop where the mobile phones used to detonate the bombs were sold (https://www.counterextremism.com/extremists/jamal-zougam). The authorities found that Zougam have a strong connection with Imad Eddin Barakat Yarkas (alias Abu Dahdah), a Spaniard prosecuted for his involvement in the 9/11 attacks, which can be seen obviously in the network graph in the connection tie. The graph shows another suspect connection ties are similar to Zougam. Mohamed Chaoui who also accused in the attempt of Madrid. According to the Guardian Mr Chaoui’s name features on a phone tap of an alleged Spanish al-Qaida cell that may have helped prepare the September 11 plot in addition to that he also charged with multiple counts of murder, attempted murder, stealing a vehicle, belonging to a terrorist organization and four counts of carrying out terrorist acts. The BBC reported on 19th of March that Mohamed Bekkali also been areasted, our graph shows also who a strong connection tie between him and both Zougam and Chaoui.http://news.bbc.co.uk/2/hi/europe/3597885.stm Naima Oulad Akcha, a non-directly participated individual in Madrid bombing also arrested according to BBC as the only woman suspect, however her connection ties shows that she maybe on of the main participants on the attacks. Interestingly, four of the suscepts who were accused of placing the bomb (Abddenabi Koujam, Mohamed Oulad Akcha, Rachid Oulad Akcha, and Nasredine Boushoa) have a very small connection ties compared with other individuals, reports online did not shows much information about their roles in the operation.

Assignment 1.3

## Warning in cluster_edge_betweenness(net): At community.c:460 :Membership vector
## will be selected based on the lowest modularity score.
## Warning in cluster_edge_betweenness(net): At community.c:467 :Modularity
## calculation with weighted edge betweenness community detection might not make
## sense -- modularity treats edge weights as similarities while edge betwenness
## treats them as distances

Comment

Above graph represent clusters obtained by optimizing edge betweenness, the results show 5 main clusters, the main basic clusters centered in the middle of the graphs has been mentioned in Q1, however there was a non-defined cluster from step 1 which is the in between cluster at the top left of the graph. Outliers have been defined a one cluster in step 1, while in on this method each outlier been defined and single clusters.

Assignment 1.4

Comment

Above graph represent heatmap obtained by adjacency matrix representation to perform permutations and Hierarchical Clustering (HC) seriation method, the graph highlights obvious cluster at the top right, this cluster is similar to some extend to clusters obtained in step 1 & 3, however it’s not clear the connection ties between other suspects expect of those with high strength of connection ties for example, Jamal Zougam and Imad Eddin Barakat Yarkas.

Assignment 2. Animations of time series data

2.1

fig <- plot_ly(oilcoal, x = ~Oil, y = ~Coal, frame = ~Year, type = 'scatter', mode = 'markers',
               color = ~Country, size = ~Marker.size) %>% 
  layout(title = 'Coal versus oil',
         xaxis = list(showgrid = FALSE),
         yaxis = list(showgrid = FALSE))
fig
  • China and India’s oil and coal consumption appear to be strongly, positively correlated. The oil and coal consumption in these two countries seem to increase with a quite constant rate over time.

  • The oil and coal consumption in US and Japan seem to follow a similar pattern. However, the oil consumption in the US is much higher than in Japan.

  • Brazil has a very low coal consumption throughout the time period. The country’s oil consumption have increased over the time period though.

2.2

oilcoal %>%
  filter(Country == c("US", "Japan")) %>%
  plot_ly(x = ~Oil, y = ~Coal, frame = ~Year, type = 'scatter', mode = 'markers',
          color = ~Country, marker = list(size = 20)) %>% 
  layout(title = 'Coal versus oil',
         xaxis = list(showgrid = FALSE),
         yaxis = list(showgrid = FALSE))

Overall, the oil and coal consumption have increased with a quite constant rate in both the US and Japan throughout most of the time period. There are a few exceptions though:

  • In 1973 the oil consumption decreases in both countries due to the 1973 oil crisis.

  • In the late 70s and early 80s both US and Japan experience a decrease in oil consumption. A potential explanation is the Iranian Revolution that took place in 1978-79, and the following Iran-Iraq War.

  • In 2007 the oil consumption decreases in both countries due to the financial crisis of 2007-08.

  • In Japan, oil consumption started to decrease in the mid 90s. This is most likely a result of the economic stagnation commonly known as “the lost decades”, as well as a decreased population growth.

2.3

oilcoal$oil_p <- with(oilcoal, (Oil / (Oil + Coal)))

oilcoal_2 <- oilcoal
oilcoal_2$oil_p <- 0
oilcoal_2 <- rbind(oilcoal, oilcoal_2)

fig_3 <- plot_ly(oilcoal_2, x = ~Country, y = ~oil_p, frame = ~Year, 
                 type = 'scatter', mode = 'lines', color = ~Country) %>%
  add_lines(line = list(width = 40)) %>% 
  layout(showlegend = FALSE, xaxis = list(showgrid = FALSE))
fig_3

The advantage of the bar chart is that it makes it easier to compare the proportion of oil consumed between the different countries since the data is scaled. In the bubble chart it can be hard to spot small changes in countries with low oil and coal consumption.

A disadvantage of the bar chart is that it does not contain information regarding the total size of the oil and coal consumption. At a first glance, it might appear as if Brazil have a very high consumption. The proportions might also be harder to interpret for people who have limited knowledge of statistics.

2.4

fig_4 <- plot_ly(oilcoal_2, x = ~Country, y = ~oil_p, frame = ~Year, 
                 type = 'scatter', mode = 'lines', color = ~Country) %>%
  add_lines(line = list(width = 40)) %>% 
  layout(showlegend = FALSE, xaxis = list(showgrid = FALSE)) %>%
  animation_opts(frame = 1000, easing = "elastic", redraw = FALSE)
fig_4

There are no obvious advantages of using elastic easing.

The “bounce” makes no sense, and can be wrongly create the impression that the proportions of oil consumed fluctuates when they are not. The elastic easing can potentially also be a bit distracting.

2.5

oilcoal_wide <- reshape(data = oilcoal[,1:3], 
                timevar = "Country",
                idvar = "Year",
                direction = "wide") %>% 
  rename_all(~stringr::str_replace(.,"^Coal.",""))

mat <- rescale(as.matrix(oilcoal_wide[2:9]))

set.seed(2021)
tour <- new_tour(mat, grand_tour(), NULL)

steps <- c(0, rep(1/15, 200))

Projs <- lapply(steps, function(step_size) {  
  step <- tour(step_size)
  if(is.null(step)) {
    .GlobalEnv$tour <- new_tour(mat, guided_tour(cmass), NULL)
    step <- tour(step_size)
  }
  step
}
)

tour_dat <- function(i) {
  step <- Projs[[i]]
  proj <- center(mat %*% step$proj)
  data.frame(x = proj[,1], y = proj[,2], 
             Year = oilcoal_wide$Year)
}

proj_dat <- function(i) {
  step <- Projs[[i]]
  data.frame(
    x = step$proj[,1], y = step$proj[,2], variable = colnames(mat)
  )
}

stepz <- cumsum(steps)

tour_dats <- lapply(1:length(steps), tour_dat)
tour_datz <- Map(function(x, y) cbind(x, step = y), tour_dats, stepz)
tour_dat <- dplyr::bind_rows(tour_datz)

proj_dats <- lapply(1:length(steps), proj_dat)
proj_datz <- Map(function(x, y) cbind(x, step = y), proj_dats, stepz)
proj_dat <- dplyr::bind_rows(proj_datz)

ax <- list(
  title = "", showticklabels = FALSE,
  zeroline = FALSE, showgrid = FALSE,
  range = c(-1.1, 1.1)
)

options(digits = 3)

tour_dat <- highlight_key(tour_dat, ~Year, group = "A")

tour <- proj_dat %>%
  plot_ly(x = ~x, y = ~y, frame = ~step, color = I("black")) %>%
  add_segments(xend = 0, yend = 0, color = I("gray80")) %>%
  add_text(text = ~variable) %>%
  add_markers(data = tour_dat, text = ~Year, ids = ~Year, hoverinfo = "text") %>%
  layout(showlegend = FALSE, xaxis = ax, yaxis = ax)

tour
# Time series plot for Germany

oilcoal_wide %>%
  ggplot(aes(x = Year, y = Germany)) +
  geom_line()

By using set.seed(2021), two quite well-separated and compact clusters can be found at step 9.8. One of these clusters contain years prior to 1991, and the other cluster contain year from 1991 and onward. Germany appear to be an influential variable, and by looking the time-series plot over the country’s coal consumption it can be seen that the coal consumption decreases rather dramatically from about 1990 on onward.

Appendix

library(ggraph)
library(igraph)
library(visNetwork)
library(seriation)
library(plotly)
library(dplyr)
library(tourr)

##1##
link<-trainData
node<-trainMeta

node$id<-rownames(node)
node<-node[,c(3,1,2)]
colnames(node)[1:3]<-c("id","name","part")
node$part[node$part==1]<-"Participated in placing the explosives"
node$part[node$part==0]<-"Did't participated"
colnames(link)[1:3]<-c("from","to","weight")
link <- link[order(link$from, link$to),]
node$group<-node$part
node$label<-node$name
node$value<-strength(net)
link$width=link$weight
net <- graph_from_data_frame(d=link, vertices=node, directed=T)

visNetwork(node,link)%>%visLegend()%>%visOptions(highlightNearest = list(enabled = T, degree = 1, hover = T))%>%visPhysics(solver = "repulsion")

##2##
visNetwork(node,link)%>%visLegend()%>%visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T))%>%visPhysics(solver = "repulsion")

##3##
node1<-node
net <- graph_from_data_frame(d=link, vertices=node, directed=F)
ceb <- cluster_edge_betweenness(net) 
node1$group<-ceb$membership
visNetwork(node1,link)%>%visIgraphLayout()%>%visOptions(highlightNearest = list(enabled = T, degree = 2, hover = T))
##4##
netm <- get.adjacency(net, attr="width", sparse=F)
colnames(netm) <- V(net)$name
rownames(netm) <- V(net)$name
rowdist<-dist(netm)
library(seriation)
order1<-seriate(rowdist, "HC")
ord1<-get_order(order1)
reordmatr<-netm[ord1,ord1]
library(plotly)
plot_ly(z=~reordmatr, x=~colnames(reordmatr), 
        y=~rownames(reordmatr), type="heatmap")

#### 2.1 ####

fig <- plot_ly(oilcoal, x = ~Oil, y = ~Coal, frame = ~Year, type = 'scatter', mode = 'markers',
               color = ~Country, size = ~Marker.size) %>% 
  layout(title = 'Coal versus oil',
         xaxis = list(showgrid = FALSE),
         yaxis = list(showgrid = FALSE))

fig

#### 2.2 ####

oilcoal %>%
  filter(Country == c("US", "Japan")) %>%
  plot_ly(x = ~Oil, y = ~Coal, frame = ~Year, type = 'scatter', mode = 'markers',
          color = ~Country, marker = list(size = 20)) %>% 
  layout(title = 'Coal versus oil',
         xaxis = list(showgrid = FALSE),
         yaxis = list(showgrid = FALSE))


#### 2.3 ####

oilcoal$oil_p <- with(oilcoal, (Oil / (Oil + Coal)))

oilcoal_2 <- oilcoal
oilcoal_2$oil_p <- 0
oilcoal_2 <- rbind(oilcoal, oilcoal_2)

fig_3 <- plot_ly(oilcoal_2, x = ~Country, y = ~oil_p, frame = ~Year, 
                 type = 'scatter', mode = 'lines', color = ~Country) %>%
  add_lines(line = list(width = 40)) %>% 
  layout(showlegend = FALSE, xaxis = list(showgrid = FALSE))
fig_3

#### 2.4 ####

fig_4 <- plot_ly(oilcoal_2, x = ~Country, y = ~oil_p, frame = ~Year, 
                 type = 'scatter', mode = 'lines', color = ~Country) %>%
  add_lines(line = list(width = 40)) %>% 
  layout(showlegend = FALSE, xaxis = list(showgrid = FALSE)) %>%
  animation_opts(frame = 1000, easing = "elastic", redraw = FALSE)
fig_4

#### 2.5 ####

oilcoal_wide <- reshape(data = oilcoal[,1:3], 
                timevar = "Country",
                idvar = "Year",
                direction = "wide") %>% 
  rename_all(~stringr::str_replace(.,"^Coal.",""))

mat <- rescale(as.matrix(oilcoal_wide[2:9]))

set.seed(2021)
tour <- new_tour(mat, grand_tour(), NULL)

steps <- c(0, rep(1/15, 200))

Projs <- lapply(steps, function(step_size) {  
  step <- tour(step_size)
  if(is.null(step)) {
    .GlobalEnv$tour <- new_tour(mat, guided_tour(cmass), NULL)
    step <- tour(step_size)
  }
  step
}
)

tour_dat <- function(i) {
  step <- Projs[[i]]
  proj <- center(mat %*% step$proj)
  data.frame(x = proj[,1], y = proj[,2], 
             Year = oilcoal_wide$Year)
}

proj_dat <- function(i) {
  step <- Projs[[i]]
  data.frame(
    x = step$proj[,1], y = step$proj[,2], variable = colnames(mat)
  )
}

stepz <- cumsum(steps)

tour_dats <- lapply(1:length(steps), tour_dat)
tour_datz <- Map(function(x, y) cbind(x, step = y), tour_dats, stepz)
tour_dat <- dplyr::bind_rows(tour_datz)

proj_dats <- lapply(1:length(steps), proj_dat)
proj_datz <- Map(function(x, y) cbind(x, step = y), proj_dats, stepz)
proj_dat <- dplyr::bind_rows(proj_datz)

ax <- list(
  title = "", showticklabels = FALSE,
  zeroline = FALSE, showgrid = FALSE,
  range = c(-1.1, 1.1)
)

options(digits = 3)

tour_dat <- highlight_key(tour_dat, ~Year, group = "A")

tour <- proj_dat %>%
  plot_ly(x = ~x, y = ~y, frame = ~step, color = I("black")) %>%
  add_segments(xend = 0, yend = 0, color = I("gray80")) %>%
  add_text(text = ~variable) %>%
  add_markers(data = tour_dat, text = ~Year, ids = ~Year, hoverinfo = "text") %>%
  layout(showlegend = FALSE, xaxis = ax, yaxis = ax)

tour

# Time series plot for Germany

oilcoal_wide %>%
  ggplot(aes(x = Year, y = Germany)) +
  geom_line()

Statement of Contribution

Simon and Mohamed devised the whole assignment together, the main conceptual ideas and codes outline. Mohamed worked out Assignment 1 (Network visualization of terrorist connections), and the report creation using r markdown, Simon worked out Assignment 2 (Animations of time series data).